A Two-Level Morphological Analyser for the Indonesian Language
Abstract
This paper presents our efforts at developing an Indonesian morphological analyser that provides a detailed analysis of the rich affixation process. We model Indonesian morphology using a two-level morphology approach, decomposing the process into a set of morphotactic and morphophonemic rules. These rules are modelled as a network of finite state transducers and implemented using xfst and lexc. Our approach is able to handle reduplication, a non-concatenative morphological process.
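As a rough illustration of the split between morphotactic and morphophonemic rules described in the abstract, the following Python sketch realises the abstract prefix meN- on a root and handles full reduplication. It is not the authors' xfst/lexc grammar: the function names, the tiny lexicon, and the simplified rule set are assumptions made for this example, and loanwords, consonant clusters, monosyllabic roots, and prefixed reduplication are ignored.

```python
# Illustrative sketch only; the rules below are a simplified assumption,
# not the rule set implemented in the paper.

# Hypothetical toy lexicon of Indonesian roots.
LEXICON = ["pukul", "tulis", "kirim", "sapu", "baca", "ambil", "lihat", "buku"]


def attach_meN(root: str) -> str:
    """Morphophonemic step: spell out the abstract prefix meN- before a root."""
    first = root[0]
    if first in "lrmnwy":      # sonorant-initial roots take plain me-
        return "me" + root
    if first == "p":           # p drops, nasal surfaces as m
        return "mem" + root[1:]
    if first == "t":           # t drops, nasal surfaces as n
        return "men" + root[1:]
    if first == "k":           # k drops, nasal surfaces as ng
        return "meng" + root[1:]
    if first == "s":           # s drops, nasal surfaces as ny
        return "meny" + root[1:]
    if first in "bfv":
        return "mem" + root
    if first in "cdjz":
        return "men" + root
    return "meng" + root       # vowels, g, h, and anything else


def generate(root: str, process: str) -> str:
    """Morphotactic step: choose which process applies to the root."""
    if process == "meN":
        return attach_meN(root)
    if process == "redup":     # full reduplication, written with a hyphen
        return root + "-" + root
    return root                # bare root


def analyse(surface: str) -> list[tuple[str, str]]:
    """Analyse by generating every candidate and keeping those that match."""
    return [
        (root, process)
        for root in LEXICON
        for process in ("bare", "meN", "redup")
        if generate(root, process) == surface
    ]


if __name__ == "__main__":
    for word in ["memukul", "menulis", "mengirim", "menyapu", "buku-buku"]:
        print(word, analyse(word))
```

Running analysis as "generate and compare" mimics, in a very crude way, the fact that a finite state transducer can be applied in both directions; a real transducer does this without enumerating the lexicon.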
Related resources
Word classes in Indonesian: A linguistic reality or a convenient fallacy in natural language processing?
This paper looks at Indonesian (Bahasa Indonesia), and the claim that there is no noun-verb distinction within the language as it is spoken in regions such as Riau and Jakarta. We test this claim for the language as it is written by a variety of Indonesian speakers using empirical methods traditionally used in part-of-speech induction. In this study we use only morphological patterns that we ge...
Indonesian Morphology Tool (MorphInd): Towards an Indonesian Corpus
This paper describes a robust finite state morphology tool for Indonesian (MorphInd), which handles both morphological analysis and lemmatization for a given surface word form so that it is suitable for further language processing. MorphInd has wider coverage on handling Indonesian derivational and inflectional morphology compared to an existing Indonesian morphological analyzer [1], along with...
NANYANG TECHNOLOGICAL UNIVERSITY SCHOOL OF HUMANITIES AND SOCIAL SCIENCES Creating derivational morphology links in Wordnet Bahasa
Derivational morphology links are created for the Wordnet Bahasa, a combined Indonesian and Malay online lexical dictionary (Nurril Hirfana, Suerya, & Bond, 2011). The focus was to link root words to affixed words as affixation is one of the more apparent word formation processes in Bahasa Melayu. MorphInd, an Indonesian morphological analyser (Larasati, Kubon, & Zeman, 2011), is used to breakd...
Towards an Indonesian-English SMT System: A Case Study of an Under-Studied and Under-Resourced Language, Indonesian
This paper describes work on preparing an Indonesian-English Statistical Machine Translation (SMT) system. It includes the creation of an Indonesian morphological analyzer, MorphInd, and the compilation of an Indonesian-English parallel corpus, IDENTIC. We build an SMT system using the state-of-the-art phrase-based SMT system, MOSES. We show several scenarios where the morphological tool is used t...
Syntactic Underspecification in Riau Indonesian
Indonesian is known for having a relatively simple morphological and syntactic structure. This is especially true of local varieties of the language, where contrast between categories found in Standard Indonesian is neutralized. In the Indonesian variety spoken in Riau Province, there is almost no morphological marking of grammatical categories and there is relatively free word order. Gil (1994...